Weakly-Supervised Action Localization, and Action Recognition Using Global–Local Attention of 3D CNN
Abstract
A 3D convolutional neural network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss that occurs seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches: (i) aggregate the layer-wise global-to-local (global–local) discrete gradient using a trained 3DResNext network, and (ii) implement an attention gating network to improve the accuracy of action recognition. The proposed approach intends to show the usefulness of every layer, termed global–local attention in 3D CNN, via visual attribution, weakly-supervised action localization, and action recognition. Firstly, the 3DResNext is trained and applied for action classification using backpropagation concerning the maximum predicted class. The gradient and activation of every layer are then up-sampled. Later, aggregation is used to produce more nuanced attention, which points out the most critical part of the predicted class's input videos. We use contour thresholding of the final attention for final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCAM. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, action recognition via attention gating on each layer produces better classification results than the baseline model.
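The pipeline outlined in the abstract (per-layer gradients and activations, up-sampling, aggregation, thresholding) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a Grad-CAM-style channel weighting, nearest-neighbour up-sampling, a plain mean as the aggregation, and a fixed threshold followed by a bounding box in place of contour thresholding. All function names are hypothetical, and synthetic arrays stand in for the activations and gradients of a trained 3DResNext.

```python
import numpy as np

def layer_attention(activation, gradient):
    """Grad-CAM-style map for one layer (assumed weighting); inputs are (C, T, H, W)."""
    weights = gradient.mean(axis=(1, 2, 3))           # one weight per channel
    cam = np.tensordot(weights, activation, axes=1)   # weighted channel sum -> (T, H, W)
    return np.maximum(cam, 0.0)                       # keep positive evidence only

def upsample_nearest(volume, out_shape):
    """Nearest-neighbour resize of a (T, H, W) volume to out_shape."""
    idx = [np.arange(o) * s // o for o, s in zip(out_shape, volume.shape)]
    return volume[np.ix_(*idx)]

def normalize(volume):
    """Rescale a volume to [0, 1]; a constant volume becomes all zeros."""
    span = volume.max() - volume.min()
    return (volume - volume.min()) / span if span > 0 else np.zeros_like(volume)

def global_local_attention(layers, out_shape):
    """Average the up-sampled, normalized maps of all layers (assumed aggregation)."""
    maps = [normalize(upsample_nearest(layer_attention(a, g), out_shape))
            for a, g in layers]
    return np.mean(maps, axis=0)

def localize(attention, threshold=0.5):
    """Inclusive (t0, t1, y0, y1, x0, x1) box around cells at or above threshold."""
    mask = attention >= threshold
    if not mask.any():
        return None
    t, y, x = np.nonzero(mask)
    return tuple(int(v) for v in (t.min(), t.max(), y.min(), y.max(), x.min(), x.max()))

# Synthetic stand-ins for two layers: a coarse "global" one and a finer "local" one.
deep_act = np.zeros((2, 2, 2, 2))
deep_act[:, 1, 1, 1] = 1.0
shallow_act = np.zeros((1, 4, 4, 4))
shallow_act[0, 2:, 2:, 2:] = 1.0
layers = [(deep_act, np.ones_like(deep_act)),
          (shallow_act, np.ones_like(shallow_act))]

attention = global_local_attention(layers, out_shape=(4, 8, 8))
box = localize(attention)
print(attention.shape, box)
```

In a real setting the activation and gradient pairs would come from hooks on the network's layers after backpropagating the score of the top predicted class, and the threshold would be tuned on validation data.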
Similar resources
Weakly Supervised Action Recognition and Localization Using Web Images
This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. ...
Towards Weakly-Supervised Action Localization
This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...
Action Recognition by Weakly-Supervised Discriminative Region Localization
We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with a training label, without any action location information within the video. Additionally, we eliminate the need for any pre-processing measures to help shortlist ca...
Weakly Supervised Action Detection
Detection of human action in videos has many applications such as video surveillance and content based video retrieval. Actions can be considered as spatio-temporal objects corresponding to spatio-temporal volumes in a video. The problem of action detection can thus be solved similarly to object detection in 2D images [3] where typically an object classifier is trained using positive and negati...
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video usin...
Journal
Journal title: International Journal of Computer Vision
Year: 2022
ISSN: 0920-5691, 1573-1405
DOI: https://doi.org/10.1007/s11263-022-01649-x